FAQ Mining Via List Detection
Authors
Abstract
This paper presents an approach to FAQ mining based on a list detection algorithm. List detection matters for data collection because lists are widely used to represent data and information on the Web. By analyzing how FAQs are rendered on the Web, we found that FAQs are always fully or partially represented in a list-like form. There are two ways to author a list on the Web. One is to use tags dedicated to lists, e.g., the list tags of HTML. Lists authored this way are easy to detect by parsing those special tags. The other is to use other tags in place of the special ones. Unfortunately, many lists are authored in the second way. We therefore present a list detection algorithm that is independent of the Web language used. By combining the algorithm with domain knowledge, we detect and collect FAQs from the Web. The mining task achieved 72.54% recall and 80.16% precision.
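The abstract does not describe the detection algorithm itself, so the following is only a minimal sketch of the two authoring styles it distinguishes, not the authors' method. It assumes that an explicit list is a `ul`, `ol`, or `dl` element, and that an implicit list can be approximated as a run of at least `min_items` consecutive sibling elements sharing the same tag name. The names `Node`, `TreeBuilder`, `detect_lists`, and `min_items` are hypothetical.

```python
from html.parser import HTMLParser
from itertools import groupby

# Assumption: these tags mark a list authored "the first way".
EXPLICIT_LIST_TAGS = {"ul", "ol", "dl"}
# Elements that never have a closing tag, so the parser must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """A bare element node: tag name, parent, and child elements (text is ignored)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple element tree; it does not repair unclosed tags such as <p>."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def detect_lists(html, min_items=3):
    """Return sorted (kind, tag, item_count) tuples for list-like structures."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    found = []
    stack = [builder.root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if node.tag in EXPLICIT_LIST_TAGS:
            found.append(("explicit", node.tag, len(node.children)))
            continue
        # Heuristic (an assumption): repeated sibling tags suggest a list.
        for tag, group in groupby(child.tag for child in node.children):
            count = len(list(group))
            if count >= min_items:
                found.append(("implicit", tag, count))
    return sorted(found)


if __name__ == "__main__":
    sample = """
    <body>
      <ul><li>a</li><li>b</li></ul>
      <div>
        <p><b>Q1</b> A1</p>
        <p><b>Q2</b> A2</p>
        <p><b>Q3</b> A3</p>
      </div>
    </body>
    """
    print(detect_lists(sample))
```

Because the heuristic compares only tag names, it would work on any tag-based markup, which is in the spirit of the language independence the abstract claims. Telling FAQ lists apart from other lists would still need the domain knowledge the abstract mentions, which this sketch does not attempt.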
Similar resources
Latent Semantic Inference for Agriculture FAQ Retrieval
An FAQ system can help users find answers to the problems that puzzle them, but research on Chinese FAQ systems is still at the theoretical stage. This paper presents an approach to semantic inference for FAQ mining. To enhance the efficiency, a small pool of the candidate question-answering pairs retrieved from the system for the follow-up work according to the concept of the agriculture dom...
Web-Based Communication Strategies Designed to Improve Intention to Minimize Risk for Colorectal Cancer: Randomized Controlled Trial
BACKGROUND People seek information on the Web for managing their colorectal cancer (CRC) risk but retrieve much personally irrelevant material. Targeting information pertinent to this cohort via a frequently asked question (FAQ) format could improve outcomes. OBJECTIVE We identified and prioritized colorectal cancer information for men and women aged 35 to 74 years (study 1) and built a websi...
The Viewpoints FAQ
The structure of this brief paper follows an emerging convention: the FAQ (Frequently Asked Questions) list. FAQs have grown out of Internet newsgroups where participants, tired of seeing the same questions repeated by newcomers, provide a list of canned answers to the most frequently asked questions. An FAQ also plays a covert role in defusing tiresome or unduly acrimonious debates by summarisi...
Template-Based Information Mining from HTML Documents
Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form, extracting useful information from a document usually involves information retrieval techniques or manual processing. This paper presents a novel approach to mining information from HTML documents using tree-structur...
FAQ: A Framework for Fast Approximate Query Processing on Temporal Data
Temporal queries on time-evolving data are at the heart of a broad range of business and network intelligence applications ranging from consumer behavior analysis, trend analysis, temporal pattern mining, sentiment analysis on social media, cyber security, and network monitoring. In this work, we present an innovative data structure called Fast Approximate Query-able (FAQ), which provides a unif...
Journal title:
Volume, issue:
Pages: -
Publication date: 2002